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We present an efficient, principled, and interpretable technique for inferring module assignments 
and for identifying the optimal number of modules in a given network. We show how several existing 
methods for finding modules can be described as variant, special, or limiting cases of our work, and 
how the method overcomes the resolution limit problem, accurately recovering the true number of 
modules. Our approach is based on Bayesian methods for model selection which have been used 
with success for almost a century, implemented using a variational technique developed only in the 
past decade. We apply the technique to synthetic and real networks and outline how the method 
naturally allows selection among competing models. 



Large-scale networks describing complex interactions 
among a multitude of objects have found application in 
a wide array of fields, from biology to social science to in- 
formation technology [TJ [5] . In these applications one of- 
ten wishes to morfeZ networks, suppressing the complexity 
of the full description while retaining relevant informa- 
tion about the structure of the interactions |3j . One such 
network model groups nodes into modules, or "commu- 
nities," with different densities of intra- and inter- con- 
nectivity for nodes in the same or different modules. We 
present here a computationally efficient Bayesian frame- 
work for inferring the number of modules, model param- 
eters, and module assignments for such a model. 

The problem of finding modules in networks (or "com- 
munity detection") has received much attention in the 
physics literature, wherein many approaches |H |5] focus 
on optimizing an energy-based cost function with fixed 
parameters over possible assignments of nodes into mod- 
ules. The particular cost functions vary, but most com- 
pare a given node partitioning to an implicit null model, 
the two most popular being the configuration model and 
a limited version of the stochastic block model (SBM) 
[SJ [7]. While much effort has gone into how to optimize 
these cost functions, less attention has been paid to what 
is to be optimized. In recent studies which emphasize the 
importance of the latter question it was shown that there 
are inherent problems with existing approaches regard- 
less of how optimization is performed, wherein parameter 
choice sets a lower limit on the size of detected modules, 
referred to as the "resolution limit" problem [8| |9J . We 
extend recent probabilistic treatments of modular net- 
works [lOj [11] to develop a solution to this problem that 
relies on inferring distributions over the model parame- 
ters, as opposed to asserting parameter values a priori, 
to determine the modular structure of a given network. 
The developed techniques are principled, interpretable, 
computationally efficient, and can be shown to general- 
ize several previous studies on module detection. 

We specify an iV-node network by its adjacency matrix 



A, where Aij = 1 if there is an edge between nodes i and 
j and Aij = otherwise, and define ai € {1, . . . , K} to 
be the unobserved module membership of the i"^ node. 
We use a constrained SBM, which consists of a multino- 
mial distribution over module assignments with weights 
TT^ = p{<7i = /i|7r) and Bernoulli distributions over edges 
contained within and between modules with weights "dc = 
p{A^j = l|(Tj = aj,i}) and -d^ = p{A^j = l\ai ^ cTj,^), 
respectively. In short, to generate a random undirected 
graph under this model we roll a ii'-sided die (biased by 
71^) N times to determine module assignments for each of 
the N nodes; we then fiip one of two biased coins (for ei- 
ther intra- or inter- module connection, biased by dc, '&di 
respectively) for each of the N{N — l)/2 pairs of nodes 
to determine if the pair is connected. The extension to 
directed graphs is straightforward. 

Using this model, we write the joint probability 
p{A,(T\Tf,{}, K) = p{A\(7,'d)p{a\Tr) (conditional depen- 
dence on K has been suppressed below for brevity) as 

K 

p{A,a\7f,d) ^ d^Hi - d^r-r/ii ~ i[n;^ (i) 

where c_|_ = X]i>j ^ij'^o'i.o-j is the number of edges con- 
tained within communities, c_ = J2i>ji^ ~ Aij)Sa-i,aj is 
the number of non-edges contained within communities, 
d+ = J2i>j ^ii(l~<^cri,crj) IS the number of edges between 
different communities, d- = J2i>ji^ ~ ~ ^cri,crj) is 

the number of non-edges between different communities, 
and = X]il=i ^o-i.M ^^"^ occupation number of the /i"^ 
module. Defining H = — lnp(A, (t|7i^, i?) and regrouping 
terms by local and global counts, we recover (up to ad- 
ditive constants) a generalized version of pUj : 

K N 

W - - I] {JLA^J - Jg) 5a^,a, +Y.^^^Y. '^-•.'^ (2) 

a Potts model Hamiltonian with unknown coupling con- 
stants Jg =ln(l-i?d)/(l-i9c), Jl = ln^?c/'?d + Jg, and 
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chemical potentials /i^ = — Invr^. (Note that many pre- 
vious methods omit a chemical potential term, implicitly 
assuming equally-sized groups.) 

While previous approaches [H [TH] minimize related 
Hamiltonians as a function of a, these methods require 
that the user specifies values for these unknown con- 
stants, which gives rise to the resolution limit problem 
[HJIH]. Our approach, however, uses a disorder-averaged 
calculation to infer distributions over these parameters, 
avoiding this issue. To do so, we take beta (B) and 
Dirichlet (T>) distributions over ?? and tt, respectively: 

P0)p{7f) = 6(7?,;5+„,£_J6(z9rf; J+„,J_„)I?(7f;fio). (3) 

These conjugate prior distributions, are defined on the 
full range of d and if, respectively, and their functional 
forms are preserved when integrated against the model to 
obtain updated parameter distributions. Their hyperpa- 
rameters {5+^ , c_„ , d+g , d^^ , no} act as pseudocounts that 
augment observed edge counts and occupation numbers. 

In this framework the problem of module detection can 
be stated as follows: given an adjacency matrix A, deter- 
mine the most probable number of modules (i.e. occupied 
spin states) K* = argmax^ p{K\A) and infer posterior 
distributions over the model parameters (i.e. coupling 
constants and chemical potentials) p(7f, i?| A) and the la- 
tent module assignments (i.e. spin states) p{d\A). In the 
absence of a priori belief about the number of modules, 
we demand that p{K) is sufficiently weak that maximiz- 
ing p{K\K) oc p{A\K)p{K) is equivalent to maximizing 
p{A\K), referred to as the evidence. This approach to 
model selection [12] proposed by the statistical physicist 
Jeffreys in 1935 [T3] balances model fidelity and complex- 
ity to determine, in this context, the number of modules. 

A more physically intuitive interpretation of the evi- 
dence is as the disorder-averaged partition function of a 
spin-glass, calculated by marginalizing over the possible 
quenched values of the parameters ■& and tt as well as the 
spin configurations a: 

Z^p{A\K) = J2 jdiT p{A,a\TfJ)p0)p{Tf){A) 

^^jdd Jdi: e~^p{S)p{n). (5) 

While the •& and tt integrals in Eqn. |4]can be performed 
analytically, the remaining sum over module assignments 
a scales as and becomes computationally intractable 
for networks of even modest sizes. To accommodate 
large-scale networks we use a variational approach that is 
well-known to the statistical physics community [14] and 
has recently found application in the statistics and ma- 
chine learning literature, commonly termed variational 
Bayes (VB) [15^. We proceed by taking the negative log- 



< dS dTT q{a,TTj)ln 



arithm of Z and using Gibbs's inequality: 

-InZ = -InV /d^-/d^,(a,^,#)^^(Mlll|^) 
J q{a,Tf,§) 

p(A,g,7r, ^\K^ ^ 
g(CT,7r,t?) 

That is, we first multiply and divide by an arbitrary 
approximating distribution q((7, tt, -d) and then upper- 
bound the log of the expectation by the expectation of 
the log. We define the quantity to be minimized - the 
expression in Eqn. |7] - as the variational free energy 
F{q; A}, a functional of q{(T, tt, -d). (Note that the nega- 
tive log of q{(j, if, 1?) plays the role of a test Hamiltonian 
in variational approaches in statistical mechanics.) 

We next choose a factorized approximating distribu- 
tion q{a,if,'d) = qa{d)qiT{if)qg{'&) with 5^(7?) = 'D{if;n) 
and qg{S) = qMqdi^d) = B{dc;~c+,~C-)B{dd;d+,d^); 
as in mean field theory, we factorize ^^(tT) as q{<Ji — fi) — 
Qi^, an A'^- by- i^T matrix which gives the probability that 
the i-th. node belongs to the /z-th module. Evaluating 
F{q; A} with this functional form for q{a, if, d) gives a 
function of the variational parameters {c+, c-,d^,d^, n} 
and matrix elements Qi^ which can subsequently be min- 
imized by taking the appropriate derivatives. 

We summarize the resulting iterative algorithm, which 
provably converges to a local minimum of F{q; A} 
and provides controlled approximations to the evidence 
p{A\K) as well as the posteriors p{if,d\A) and p{cj\A): 

Initialization. — Initialize the N-hy-K matrix Q 
Qo and set pseudocounts c+ — c+p,c_ = c_(,,(i+ — 
d+„ , d- = d-g , and = n^^ . 

Main Loop. — Until convergence in F{q; A}: 

(i) Update the expected value of the couphng constants 
and chemical potentials 



(Jl) = ^(5+)-V(£-)-V'(d+)+V'(d-) (8) 
(Jg) = ^(J_)-^(J+ + d-) 

-^(5_) + V;(S+ + 5_) (9) 



(10) 



where ipi^x) is the digamma function; 

(ii) Update the variational distribution over each spin 

Q,^ cx exp 1^ [{Jl)Aj - {Jg)] Qjm - {h^) | (11) 

normalized such that Qi^ = 1, for all i; 

(iii) Update the variational distribution over parame- 
ters from the expected counts and pseudocounts 



n„ + n„,„ = 



N 



(12) 



c+ = (c+) + £+0 = -rr(Q^AQ) + 5, 



(13) 



= -TriQ^{u{nf ~Q))-{c+) + d_, (14) 

d+ = {d+) + d+„ = M - {c+) + d+„ (15) 
J_ = =C-A/-(c_)+J_,, (16) 

where C = 7V(7V - l)/2, M ^ J2i>j and m is a A^- 
by-1 vector of I's; 

(iv) Calculate the updated optimized free energy 



F{q-K\ = -ln|^S + ^^Q,,lng,^,(17) 

where 2jf — Bin) is the beta function with a vector- 
valued argument, the partition function for the Dirichlet 
distribution 5^(7?) (likewise for qci'&c),qdii^d))- 

As this provably converges to a local optimum, VB is 
best implemented with multiple randomly-chosen initial- 
izations of Qo to find the global minimum of F{q; A}. 

Convergence of the above algorithm provides the ap- 
proximate posterior distributions qs{cT), 9^(7?), and g^(i?) 
and simultaneously returns K* , the number of non-empty 
modules that maximizes the evidence. As such, one needs 
only to specify a maximum number of allowed modules 
and run VB; the probability of occupation for extraneous 
modules converges to zero as the algorithm runs and the 
most probable number of occupied modules remains. 

This is significantly more accurate than other approx- 
imate methods, such as Bayesian Information Criterion 
(BIC) [16j and Integrated Classification Likelihood (ICL) 
[T71 [TB] , and is less computationally expensive than em- 
pirical methods such as cross-validation (CV) [121 IMj in 
which one must perform the associated procedure after 
fitting the model for each considered value of K. Specifi- 
cally, BIC and ICL are suggested for single-peaked likeli- 
hood functions well-approximated by Laplace integration 
and studied in the large- limit. For a SBM the first as- 
sumption of a single-peaked function is invalidated by the 
underlying symmetries of the latent variables, i.e. nodes 
are distinguishable and modules indistinguishable. See 
Fig. for comparison of our method with the Girvan- 
Newman modularity [S] in the resolution limit test [SI [H] , 
where VB consistently identifies the correct number of 
modules. (Note that VB is both accurate and fast: it 
performs competitively in the "four groups" test [21] and 
scales as 0{MK). Runtime for the main loop in MAT- 
LAB on a 2GHz laptop is ~6 minutes for = 10^ nodes 
with average degree 16 and K = 4.) 

Furthermore, we note that previous methods in which 
parameter inference is performed by optimizing a like- 
lihood function via Expectation Maximization (EM) 
[TTl [TS] are also special cases of the framework presented 
here. EM is a limiting case of VB in which one collapses 
the distributions over parameters to point-estimates at 
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FIG. 1: Results for the resolution limit test suggested in |9] 
and [8]. Shapes and colors correspond to the inferred modules. 
(Left) Our method, variational Bayes, in which all 15 mod- 
ules are correctly identified (each clique is assigned a unique 
color/shape). (Right) GN modularity optimization, where 
failure due to the resolution limit is observed - neighboring 
cliques are incorrectly grouped together. (Bottom) The re- 
sults of this test implemented for a range of true number of 
modules, Ktmc, the number of 4-node cliques in the ring-like 
graph. Note that our method correctly infers the number of 
communities T^vb over the entire range of KTmc, while GN 
modularity initially finds the correct number of communities 
but fails for Kttuc > 15 as shown analytically in [9]. 



the mode of each distribution; however EM is prone to 
overfitting and cannot be used to determine the appro- 
priate number of modules, as the likelihood of observed 
data increases with the number of modules in the model. 
As such, VB performs at least as well as EM while simul- 
taneously providing complexity control [221123] . 

In addition to validating the method on synthetic net- 
works, we apply VB to the 2000 NCAA American foot- 
ball schedule shown in Fig. [2] 24J. Each of the 115 
nodes represents an individual team and each of the 613 
edges represents a game played between the nodes joined. 
The algorithm correctly identifies the presence of the 12 
conferences which comprise the schedule, where teams 
tend to play more games within than between confer- 
ences, making most modules assortative. Of the 115 
teams, 105 teams are assigned to their corresponding 
conferences, with the majority of exceptions belonging 
to the frequently-misclassified independent teams [53] - 
the only disassortative group in the network. We empha- 
size that, unlike other methods in which the number of 
conferences must be asserted, VB determines 12 as the 
most probable number of conferences automatically. 

Posing module detection as inference of a latent vari- 
able within a probabilistic model has a number of advan- 
tages. It clarifies what precisely is to be optimized and 
suggests a principled and efficient procedure for how to 
perform this optimization. Inferring distributions over 
model parameters reveals the natural scale of a given 
modular network, avoiding resolution limit problems. 
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FIG. 2: Each of the 115 nodes represents a NCAA team and 
each of the 613 edges a game played in 2000 between two 
teams it joins. The inferred module assignments (designated 
by color) on the football network which recover the 12 NCAA 
conferences (designated by shape). Nodes 29, 43, 59, 60, 64, 
81, 83, 91, 98, and 111 are misclassified and are mostly inde- 
pendent teams, represented by parallelograms. 



This method allows us to view a number of approaches 
to the problem by physicists, applied mathematicians, 
social scientists, and computer scientists as related sub- 
parts of a larger problem. In short, it suggests how a 
number of seemingly-disparate methods may be re-cast 
and united. A second advantage of this work is its gen- 
eralization to other models, including those designed to 
reveal structural features other than modularity. Finally, 
use of the evidence allows model selection not only among 
nested models, e.g. models differing only in the number 
of parameters, but even among models of different para- 
metric families. The last strikes us as a natural area for 
progress in the statistical study of real- world networks. 
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Jonathan Goodman, with David Blei on variational 
methods, and with Aaron Clauset for his feedback on this 
manuscript. J.H. was supported by NIH 5PN2EY016586; 
C.W. was supported by NSF ECS-0425850 and NIH 
1U54CA121852. 
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